ABSTRACT

A plant cytosine methyltransferase cDNA was isolated 
using degenerate oligonucleotides, based on homology 
between prokaryote and mouse methyltransferases, 
and PCR to amplify a short fragment of a methyltransferase
gene. A fragment of the predicted size was
amplified from genomic DNA from Arabidopsis thaliana. 
Overlapping cDNA clones, some with homology to the 
PCR amplified fragment, were identified and 
sequenced. The assembled nucleic acid sequence is 
4720 bp and encodes a protein of 1534 amino acids 
which has significant homology to prokaryote and 
mammalian cytosine methyltransferases. Like 
mammalian methylases, this enzyme has a C terminal 
methyltransferase domain linked to a second larger 
domain. The Arabidopsis methylase has eight of the 
ten conserved sequence motifs found in prokaryote 
cytosine-5 methyltransferases and shows 50%
homology to the murine enzyme in the methyltransferase
domain. The amino terminal domain is only 24%
homologous to the murine enzyme and lacks the zinc
binding region that has been found in methyltransferases
from both mouse and man. In contrast to mouse
where a single methyltransferase gene has been 
identified, a small multigene family with homology to 
the region amplified in PCR has been identified in 
Arabidopsis thaliana. 



INTRODUCTION 

The most common modification of DNA in higher eukaryotes 
is methylation of cytosine residues at carbon 5. In vertebrates, 
3-8% of cytosines are methylated (1) while in plants up to 30%
of cytosines are modified (2). The difference in extent of cytosine 
methylation between vertebrates and plants can be attributed to 
two factors. DNA methylation in animals is generally confined 
to cytosines in CG dinucleotides while in plants methylation 
occurs at cytosines located in both CG dinucleotides and CNG 
triplets, where N is any base (3). In addition the CG dinucleotide 
is more common in DNA of plants than of animals. DNA 
methylation has been implicated in regulating gene expression
during development, in determining chromatin structure and in
compartmentalization of DNA (reviewed in 4, 5, 6). 
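
The two sequence contexts described above can be counted directly in a DNA string. The following is a minimal sketch, not part of the original methods; the function name and the test sequence are illustrative assumptions. Only the given strand is scanned, and a cytosine in a CGG triplet is counted in both categories because N is taken as any base.

```python
def methylation_contexts(seq):
    """Count cytosines in CG dinucleotides and CNG triplets on one strand.

    N is any base, so a cytosine followed by GG falls in both categories.
    """
    seq = seq.upper()
    # Cytosine immediately followed by G.
    cg = sum(1 for i in range(len(seq) - 1)
             if seq[i] == "C" and seq[i + 1] == "G")
    # Cytosine with a G two positions downstream.
    cng = sum(1 for i in range(len(seq) - 2)
              if seq[i] == "C" and seq[i + 2] == "G")
    return {"CG": cg, "CNG": cng, "C_total": seq.count("C")}


# Hypothetical 11-base sequence used only to exercise the function.
counts = methylation_contexts("ACGTCAGCCGG")
```

Under this counting a plant genome offers more potential target cytosines than one in which only CG sites are methylated, which is the first of the two factors mentioned above.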

Methyl groups are transferred to cytosine residues from 
S-adenosyl methionine in a reaction catalysed by a DNA 
methyltransferase or methylase (7). Prokaryote cytosine 
methyltransferases generally methylate cytosines within a longer 
target sequence, while mammalian methyltransferases methylate 
cytosine residues in any CG dinucleotide. Prokaryote cytosine 
methyltransferases are structurally similar with highly conserved 
motifs alternating with less well conserved sequences (8, 9, 10). 
The presence of these conserved motifs, designated I to X, which 
occur in the same order in all these enzymes differentiates 
cytosine-5 methyltransferases from cytosine-N4 and adenine-N6 
methylases (9). The cysteine residue of a highly conserved 
proline-cysteine doublet (motif IV, PCXXXS) forms the active- 
site (11, 12, 13, 14), while motif I contains a sequence that is 
thought to bind S-adenosyl methionine (F/GXGXG, 15). The 
target recognition domain which specifies both the target sequence 
and the base to be methylated lies in the variable region between 
motifs VIII and IX (12, 16, 17, 18).

Mammalian cytosine methyltransferases consist of two
protein domains which fold independently, a C terminal 
methyltransferase domain which is structurally similar to that of 
prokaryote methylases, fused to a second large domain (19, 20). 
A single DNA methyltransferase gene has been detected in the 
mouse which is consistent with the finding that there is a single 
species of methylase in different cell types which differ in the 
pattern of DNA methylation (19). Partial purification of a 
methyltransferase enzyme from pea (21, 22), wheat (23) and rice 
(24) has been reported but no plant methyltransferase genes have 
been cloned. Plants differ from animals in both the extent and 
sequence specificity of methylation. It is not known whether plant 
methyltransferases resemble the family of prokaryote enzymes 
each of which recognizes several target sites, or whether plants 
have multiple methyltransferases which catalyse methylation at 
CG and CNG respectively. 

We have cloned a cytosine methyltransferase gene from 
Arabidopsis thaliana. We used homology between the mouse 
methyltransferase and prokaryote methylases to design primers 
for PCR amplification of a fragment spanning conserved motifs 
IX and X. Taking this approach we have isolated overlapping 
clones which make up a full length cDNA for a methyltransferase
gene from Arabidopsis. The inferred amino acid sequence of this 
composite cDNA shows homology to cytosine methyltransferases 
in the C terminal domain. A family of genes, with homology 
to the amplified region, has been identified in a Southern analysis.



MATERIALS AND METHODS 

PCR amplification of a fragment of a DNA methylase

Primers for PCR were MMet1 (CCGAATTCCAG/AGGNTTT/CCCNGAC/T)
for region IX and MMet2 (CGGGATCCACNGCA/GTTNCCNACC/TTG)
for region X (Figure 1a), where N represents a mix of all four bases.
Each primer carries a restriction endonuclease recognition site near
its 5' end (EcoRI, GAATTC, in MMet1; BamHI, GGATCC, in MMet2).
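
The degeneracy of each primer pool follows from the mixed positions. The sketch below is an illustration only: it rewrites the two primers in IUPAC ambiguity codes (R for A/G, Y for C/T, N for any base), which is our transcription of the slash notation used above, and the function names are hypothetical.

```python
from itertools import product

# IUPAC codes needed for these two primers only.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}

# MMet1 and MMet2 transcribed from the slash notation into IUPAC codes.
MMET1 = "CCGAATTCCARGGNTTYCCNGAY"
MMET2 = "CGGGATCCACNGCRTTNCCNACYTG"


def degeneracy(primer):
    """Number of distinct sequences in a degenerate primer pool."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


pool_sizes = {"MMet1": degeneracy(MMET1), "MMet2": degeneracy(MMET2)}
```

With this transcription the MMet1 pool contains 128 sequences and the MMet2 pool 256, consistent with the choice of regions of low codon degeneracy.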

The final reaction conditions for PCR were 250 ng genomic
DNA or 1 ng linearized plasmid DNA, 1 µM each primer, 200 µM
each dNTP, 10 mM Tris pH 8.8 at 25°C, 1.5 mM MgCl2, 50 mM
KCl, 0.1% Triton X-100 and 1 unit Taq DNA polymerase in
a reaction volume of 25 µl. Cycling conditions included an initial
denaturation step at 95°C for 5 minutes followed by 30 cycles
of 46°C for 30 seconds, 65°C for 30 seconds and 95°C for 30
seconds. The final cycle was 46°C for 30 seconds and 65°C for
2 minutes. Reaction products were separated on an 8%
polyacrylamide gel and amplified fragments were isolated from
the gel, digested with BamHI and EcoRI, then cloned into pUC19.
Six independent clones were sequenced to eliminate any errors 
introduced during PCR amplification. 
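
As a worked check of the reaction composition, the absolute amounts of each component follow from concentration times volume. The sketch below reads the conditions as 1 µM each primer, 200 µM each dNTP and 1.5 mM MgCl2 in a 25 µl reaction; the helper name is an assumption, not part of the original protocol.

```python
def amount_pmol(conc_uM, volume_ul):
    """Amount in pmol, since 1 uM x 1 ul = 1 pmol."""
    return conc_uM * volume_ul


REACTION_UL = 25  # reaction volume as read from the text

amounts = {
    "each primer": amount_pmol(1, REACTION_UL),       # 1 uM
    "each dNTP": amount_pmol(200, REACTION_UL),       # 200 uM
    "MgCl2": amount_pmol(1500, REACTION_UL),          # 1.5 mM = 1500 uM
}
```

On this reading each reaction contains 25 pmol of each primer pool, 5 nmol of each dNTP and 37.5 nmol of MgCl2.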

Primers specific for sequences towards the 5' end of clone Yc8
were used to amplify the region spanning the overlap between
cDNAs Yc8 and Yc2 (Figure 2). The primer AMet1
(CCTAGACTCTCACCATCCC) was used to prime the reverse
transcription of total RNA, the products were tailed in the
presence of dATP and then amplified by PCR using AMet1 and
an oligo(dT) primer containing recognition sites for the enzymes
EcoRI, SmaI and BamHI at the 5' end. The nested primer AMet2
(GCGGATCCTTCCAGAACTGCCTCGG) was used in a
subsequent PCR amplification of 1 µl of the initial PCR mix with
the same oligo(dT) primer (25). Products of this amplification were
gel purified, cleaved with BamHI and cloned into BamHI cleaved
pUC19 for sequencing.

Screening of λ genomic and cDNA libraries

The cloned, PCR amplified fragment was gel purified then 
random primed in the presence of both [32P]dATP and [32P]dCTP
(Oligolabelling Kit, Bresatech). A genomic library (Promega)
containing A.thaliana Landsberg DNA, partially digested with
MboI, was screened with this probe and positive plaques
identified. An EcoRI fragment, that hybridized to the probe, was
subcloned from a positive plaque for further mapping and
sequence analysis. A 400 bp HincII-EcoRI fragment which
encompassed the amplified region was identified and used for
screening a cDNA library (Landsberg, Promega). Three
overlapping cDNA clones were identified in this way. Screening
of a second cDNA library (Columbia with low percent
Landsberg, J. Mulligan, pers comm., 26) with the cDNA probe
Pc2 (Figure 2), identified more overlapping cDNA clones.

Sequencing and sequence analysis 

Templates for sequencing were derived by subcloning and by
generating several series of nested deletions by ExoIII deletion
(ExoIII nested deletion kit, Pharmacia). Sequencing was done
in the presence of radiolabeled dATP using T7 polymerase (T7
sequencing kit, Pharmacia), or with fluorescent dye labelled
primers and Taq DNA polymerase (Taq dye primer cycle
sequencing kit, Applied Biosystems). Sequences were obtained
for both strands of at least one cDNA clone, and in some instances
were confirmed by sequencing one strand of an overlapping
clone. Sequencing reactions using fluorescent dye primers were
resolved on an Applied Biosystems 370A DNA Sequencer.
Nucleotide and amino acid comparisons were done using the GCG 7.1
sequence analysis package (27).

Southern hybridization 

The probes used for Southern analysis were the PCR amplified 
fragment, spanning conserved regions IX and X (see Results) 
and a 398 bp fragment from cDNA clone Pc2 that encompasses
the region amplified in PCR (71 bp), flanked by 84 bp (5') and
78 bp (3') of coding sequence and 165 bp of 3' untranslated
sequence (probe 1,
Figure 2). DNA used in Southern analyses was isolated from 
ecotype Landsberg and the hybridization procedure has been 
described previously (28). 

RESULTS 

PCR amplification of part of a methyltransferase gene from Arabidopsis

The methyltransferase domain of mammalian methylases retains 
eight of the ten conserved sequence motifs that are characteristic 
of prokaryote cytosine methyltransferases (29, 9). A comparison 
of the amino acid sequence for the mouse methylase (19) with 
methylases M.HhaI (30), M.DdeI (31) and M.EcoRII (8) for
motifs IX and X revealed regions of homology with low codon
degeneracy (Figure 1a). Degenerate oligonucleotides, corresponding
to the mouse amino acid sequence in these regions, were
designed to prime amplification of the short variable region 
between motifs IX and X, that is 71 nucleotides in the mouse 
gene. Restriction endonuclease recognition sites were included 
at the 5' end of each primer to facilitate cloning which increased 
the expected length of a fragment amplified from a 
methyltransferase gene to 87 bp. 
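
The expected product size can be checked against the primer sequences. The sketch below assumes that the 71 nucleotides quoted for the mouse gene span the region from one primer binding site to the other, inclusive, and that each primer's 5' tail ends with its restriction site; the primers are written in IUPAC codes (R = A/G, Y = C/T, N = any base) and the helper name is hypothetical.

```python
# Primers transcribed into IUPAC ambiguity codes.
MMET1 = "CCGAATTCCARGGNTTYCCNGAY"
MMET2 = "CGGGATCCACNGCRTTNCCNACYTG"

TARGET_NT = 71  # primer-to-primer span in the mouse gene, from the text


def tail_length(primer, site):
    """Length of the 5' tail up to and including the restriction site."""
    return primer.index(site) + len(site)


tail1 = tail_length(MMET1, "GAATTC")  # EcoRI site in MMet1
tail2 = tail_length(MMET2, "GGATCC")  # BamHI site in MMet2

expected_product = TARGET_NT + tail1 + tail2

# Portion of each primer that anneals to the template.
anneal1 = len(MMET1) - tail1
anneal2 = len(MMET2) - tail2

# Template sequence lying strictly between the two primers.
internal_nt = TARGET_NT - anneal1 - anneal2
internal_codons = internal_nt // 3
```

Under these assumptions each tail adds 8 nucleotides, giving 87 bp, and 39 nucleotides (13 codons) lie between the primers.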

These primers were used in PCR with genomic DNA template 
from a number of plant species but a band of the predicted size 
(87 bp) was amplified only from Arabidopsis DNA (Figure 1b),
possibly because of its small genome size. This fragment was 
cloned and six independent isolates sequenced. The sequence of 
five clones was identical while the remaining clone had eight bases 
inserted between one primer and the rest of the sequence, which 
was identical to that in the other clones. This was the only clone 
out of 24 examined that contained a larger insert and may be 
an artefact of PCR amplification. 

The deduced amino acid sequence, excluding residues encoded 
by the primers, was 54% identical (7/13) to the mouse 
methyltransferase in this region (Figure 1c) suggesting that the
fragment amplified represents a fragment of a plant
methyltransferase gene. A number of other bands were also
amplified from the Arabidopsis DNA template (Figure 1b).
Amplification of only two of these was dependent upon the
addition of both primers to the reaction mix and sequence analysis
showed that these fragments were not homologous to the mouse
methyltransferase. The remaining bands were amplified when
only one primer, either MMet1 or MMet2, was included in the
reaction mix; these bands were not characterized. 
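
The identity figure quoted above (7 of 13 residues, about 54%) is a simple position-by-position comparison of two aligned sequences of equal length. A minimal sketch follows; the two 13-residue strings are hypothetical placeholders chosen to give 7 matches and are not the sequences of Figure 1c.

```python
def percent_identity(a, b):
    """Return (matches, percent identity) for two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


# Placeholder sequences (assumed, for illustration only).
toy_plant = "ACDEFGHIKLMNP"
toy_mouse = "ACDEFGHWWWWWW"

matches, pct = percent_identity(toy_plant, toy_mouse)
```

Seven matches in 13 positions is 53.8%, which rounds to the 54% reported.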



Nucleic Acids Research, 1993, Vol. 21, No. 10 2385 



a

             Motif IX            Motif X

mouse        HRVVSVRECARSQGFPD   GNILDRHRQVGNAVPPPLPKPLAW
M.DdeI       NRNFTAREGARIQSFPD   EKHLSQYQQIGNAVPPLLAQALAE
M.HhaI       TRKLHPRECARVMGYPD   PSTSQAYKQFGNSVVINVLQYIAY
M.EcoRII     PRRLTPRECARLMGFEK   VSDTQSYRQFGNSVVVPVFEAVAK




Figure 1a. Motifs IX and X from the mouse methyltransferase (19) and prokaryote
methylases M.HhaI (30), M.DdeI (31) and M.EcoRII (8) were compared. Strongly
conserved regions with low codon degeneracy were selected for synthesis of
oligonucleotide primers for PCR. The sequences represented in the primers MMet1
and MMet2 are underlined (QGFPD and QVGNAV in the mouse sequence).
b. Acrylamide gel showing the products of PCR
amplification using primers MMet1 and MMet2 with genomic DNA template
from pea (lane 1), flax (lane 2), cotton (lane 3), Arabidopsis (lane 4) and a plasmid
containing the mouse methyltransferase cDNA (lane 5). The marker (lane 6) is
pUC19 cut with HpaII. The band indicated with an arrow was cloned and
sequenced. c. Comparison of the deduced amino acid sequence of the fragment
amplified from Arabidopsis DNA in PCR and the corresponding region of the
mouse methylase. The sequences represented by the primers are in bold.



Cloning of an Arabidopsis methyltransferase cDNA 

The 87 bp PCR amplified product was used to screen a genomic 
library and positive plaques identified. Sequence analysis of one 
clone, using the primers MMet1 and MMet2, showed that the
nucleotide sequence was identical to that amplified in PCR. A
HincII-EcoRI fragment (approximately 400 bp), encompassing
the amplified product, was isolated from this clone and used to 
screen a cDNA library. Three cDNA clones ranging in size from 
about 300bp to 700bp were identified among approximately 
160,000 screened. The largest of these clones, Pc2 (Figure 2), 
was used to screen a second cDNA library (26). Four positive 
clones were isolated and the 5' end of the longest clone, Yc8 
(probe 2, Figure 2), was used to rescreen this library to isolate 
clone Yc2, which extends beyond the end of the coding region
(Figure 2). Clones Pc2, Yc8 and Yc2 were completely sequenced
on both strands, while only 300-400 bases at each end of clones
Yc7 and Yc21 were sequenced (Figure 2).

Figure 2. Location of the overlapping cDNA clones and the PCR fragment, used
to verify the overlap between clones Yc8 and Yc2, with respect to the
methyltransferase and amino terminal domains. The thin lines indicate the regions
that were sequenced, the arrowheads indicate the direction of sequencing. The
location of probes is also indicated; probe 1 is referred to as the 398 bp probe
from Pc2, probe 2 as the 5' end of clone Yc8 and probe 3 as the Yc2 probe.

Clones Yc7 and Yc8 share a common restriction map and the 
sequence of the coding regions of clones Pc2, Yc8, Yc7 and 
Yc21, where sequenced (Figure 2), are identical indicating that 
these clones are derived from the same gene. The 3' untranslated 
regions of clones Pc2, Yc7 and Yc21 are also identical in 
sequence but differ in length preceding the poly A tail. The 
sequence of clone Yc8 diverges from that of the other clones 
49 bp beyond the stop codon. This sequence difference may arise 
because these clones are from different alleles or may reflect a 
difference between ecotypes Columbia and Landsberg, both 
represented in the λYES library (J. Mulligan, pers comm.).

The overlap between Yc8 and Yc2 was confirmed by 
sequencing 4 independent isolates of a fragment amplified from 
the products of a first strand cDNA synthesis using nested 
methylase specific primers (Figure 2). Additional confirmation 
that these clones are derived from the same gene comes from 
the isolation of a genomic clone that hybridizes to both Pc2 and 
Yc2 (probes 1 and 3, Figure 2). The length of the 
methyltransferase cDNA assembled from the overlapping cDNA 
clones Yc8 and Yc2 is 4720bp not including a poly A tail 
(Accession No. L10692), which agrees with the estimate based 
on Northern analysis of 4.7kb (data not shown). 

The assembled nucleotide sequence encodes an open reading 
frame of 1534 amino acids, comparable in length to the murine 
enzyme (1587 aa). There is an inframe stop codon 66 bases 
upstream of the first methionine. The cDNA differs from the 
PCR amplified product (70% homology at the amino acid level) 
in both the region between the primers and in one amino acid 
of the priming site in region X, resulting in a mismatch at the 
fourth base from the 3' end of primer MMet2. This mismatch 
could account for the failure to detect this sequence amongst the 
six PCR clones that were originally characterized. The difference 
in sequence between the PCR amplified product and the cDNA 
clones suggests that they may represent different genes. 
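
The open reading frame described above can be located by scanning each forward frame for the longest stretch from an ATG to the next in-frame stop codon. The sketch below is a generic illustration, not the procedure used with the GCG package; the toy sequence is hypothetical and carries an in-frame stop upstream of its first ATG, mirroring the situation reported for the cDNA.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-to-stop ORF on the forward strand.

    Returns (start, end, n_codons): start is the index of the ATG, end is the
    index just past the stop codon, n_codons excludes the stop codon.
    """
    seq = seq.upper()
    best = (None, None, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons > best[2]:
                    best = (start, i + 3, n_codons)
                start = None
    return best


# Toy sequence: in-frame TGA, spacer, ATG, four GCT codons, TAA, trailing bases.
toy = "TGA" + "CCC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
orf = longest_orf(toy)

# Coding length implied by a 1534 amino acid protein plus its stop codon.
coding_nt = 1534 * 3 + 3
```

A protein of 1534 amino acids requires 4605 nucleotides including the stop codon, which fits within the 4720 bp assembled cDNA.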

Sequence comparison with the mouse methyltransferase 

The inferred amino acid sequences of the Arabidopsis and mouse
enzymes are 50% homologous in the C terminal
methyltransferase domain (Figure 3). The eight motifs conserved
in all eukaryote and prokaryote cytosine methyltransferases are
present in the same order in the plant methylase as in both
prokaryote and the other eukaryote enzymes (Figure 3). A
proline-cysteine doublet present in conserved motif IV has been
identified as the functional catalytic domain in prokaryotic
cytosine-5 methyltransferases (11, 12, 13, 14). This motif is
highly conserved between the prokaryote, mammalian and plant
enzymes (Figure 4), suggesting that the prolyl-cysteinyl doublet
may also be the catalytic site in eukaryote enzymes. The
S-adenosyl methionine binding domain is also conserved in the plant
enzyme (motif I, Figure 3).

Figure 3. A dotplot comparison between the mouse (horizontal axis) and
Arabidopsis (vertical axis) methyltransferase proteins. The window size is 30 and
stringency of the match is 18. The conserved sequence motifs in the
methyltransferase domain are indicated by numbers I to X beside the diagonal.
The arrow head indicates the location of the zinc binding region in the mouse
enzyme.

MOUSE:                  EMLCGGPPCQGFSGMN
HUMAN:                  EMLCGGPPCQGFSGMN
ARABIDOPSIS:            DFINGGPPCQGFSGMN

PROKARYOTE CONSENSUS:   D-  -G-PCP-FS--G
                        N-       Q-W

Figure 4. Comparison of conserved motif IV which contains the active cysteine
residue in cytosine-5 prokaryote methyltransferases. In the prokaryote consensus
sequence alternate amino acids are listed one above the other and variable residues
are indicated by a dash.
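
The dotplot of Figure 3 records a point wherever a window of one protein matches a window of the other above a threshold. The sketch below is a simplified illustration that scores windows by counting identical residues; this is an assumption, since comparison programs of this kind typically score windows with a substitution matrix. The defaults mirror the window of 30 and stringency of 18 quoted in the figure legend, and the short test strings are hypothetical.

```python
def dotplot(seq_a, seq_b, window=30, stringency=18):
    """Return (i, j) window starts whose windows share >= stringency identities."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            score = sum(x == y for x, y in zip(win_a, win_b))
            if score >= stringency:
                hits.append((i, j))
    return hits


# Small windows on toy strings so the result can be checked by eye.
strict = dotplot("ABCDEFGH", "XXCDEFXX", window=4, stringency=4)
relaxed = dotplot("ABCDEFGH", "XXCDEFXX", window=4, stringency=3)
```

Lowering the stringency extends the diagonal, which is why conserved regions appear as continuous lines while weakly related regions such as the amino terminal domain show only short stretches.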

The variable region between conserved motifs VIII and IX
determines the sequence specificity of methylation in the
prokaryote methylases (12, 16, 17, 18, 32). Prokaryote
methyltransferases that recognize identical or similar target
sequences have homology in this region while enzymes
recognizing different targets show little or no homology (9, 33,
34). The mouse and Arabidopsis proteins also have homology
in this region, but it does not extend the full length of the target
recognition domain (Figure 3). In addition there is a deletion of
44 aa from the Arabidopsis protein towards the C terminal end
of this region.

Figure 5a. Southern hybridization of Arabidopsis DNA cleaved with enzymes
as indicated and probed with the PCR amplified fragment that encodes 71 bases
of coding sequence for part of conserved motifs IX and X and the variable region
between these motifs. b. Southern hybridization of the same filter as shown in
Figure 5a, probed with a 398 bp fragment of cDNA Pc2 that has homology to
the PCR amplified fragment, plus 84 bp (5') and 78 bp (3') coding sequence
and 165 bp of 3' untranslated sequence (probe 1, Figure 2).

The amino terminal domain of the mouse and human enzymes 
is separated from the methyltransferase domain by 13 alternating 
lysine and glycine residues. These domains can be separated by 
proteolytic cleavage suggesting that they fold independently (20, 
35). The methyltransferase domain of the Arabidopsis protein 
is separated from the amino terminal domain by the sequence 
KKKGKG. While this differs from the corresponding sequence 
in the mammalian enzymes it is also lysine rich, suggesting that 
it may have some functional significance. 

The most striking feature of the amino terminal domain is the 
relative lack of homology (24%) between the mouse and 
Arabidopsis proteins (Figure 3). This contrasts with the mouse and
human methyltransferases which are 70% identical in this domain 
compared to 83% in the methylase domain (29). Homology 
between the mouse and Arabidopsis proteins is limited to short 
stretches throughout this domain (Figure 3). One of these regions 
of homology lies between residues 300-450 in the mouse 
enzyme; residues 207-455 target the mouse methyltransferase 
to replication forks in S phase nuclei (36). A zinc binding domain, 
CX2CX2CX4CX2CX2CX15CX4C, has been identified within the
N terminal domain of the mouse protein (35). This domain is 
conserved in the human enzyme (29), but does not occur in the 
Arabidopsis protein. An acidic region, which contains 16 glutamic 
acid and two aspartic acid residues in 36 aa (residues 656 to 692, 
Figure 3) found in the plant enzyme is not present in the 
mammalian enzymes. A sequence which resembles recognized 
nuclear localization signals (reviewed in 37) occurs near the 
amino terminus of the plant enzyme. 
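
The cysteine spacing of the zinc binding domain (eight cysteines separated by 2, 2, 4, 2, 2, 15 and 4 residues) translates directly into a regular expression, so a protein sequence can be checked for its presence. The sketch below is illustrative; the test protein is a synthetic placeholder, not the mouse or Arabidopsis sequence.

```python
import re

# Eight cysteines with spacers of 2, 2, 4, 2, 2, 15 and 4 residues.
ZINC_MOTIF = re.compile(r"C.{2}C.{2}C.{4}C.{2}C.{2}C.{15}C.{4}C")


def find_zinc_motif(protein):
    """Return (start, end) of the first match, or None if the motif is absent."""
    match = ZINC_MOTIF.search(protein.upper())
    return match.span() if match else None


# Synthetic protein: two leading residues, the motif built with alanine
# spacers, then two trailing residues.
spacers = [2, 2, 4, 2, 2, 15, 4]
motif = "C" + "".join("A" * n + "C" for n in spacers)
hit = find_zinc_motif("MK" + motif + "GG")
miss = find_zinc_motif("MKCAACAAC")
```

A search of this kind returns a match for a sequence carrying the full spacing pattern and none for a sequence with only part of it, which is the sense in which the domain is reported absent from the plant protein.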

Identification of a methyltransferase gene family in 
Arabidopsis 

Partial purification of the methyltransferase enzyme from pea has 
failed to identify two distinct methyltransferase functions, one 
specific for CG and a second for CNG (21, 22). However, in 
tobacco, methylation of CG and CNG motifs occurring in
repeated sequences showed different sensitivity to the inhibitor 
ethionine (38). While 5-azacytidine treatment resulted in 
demethylation of cytosines in both CG and CNG motifs, ethionine 
treatment caused marked demethylation of CNG triplets with little 
effect on CG methylation (38). Ethionine may alter the specificity 
of a single methyltransferase (38), or CG and CNG motifs may 
be methylated by separate enzymes which differ in their sensitivity 
to ethionine. 

Southern analyses of Arabidopsis DNA using the PCR 
amplified fragment (71bp coding sequence) showed a single band 
when DNA was cut with some enzymes (BglII and EcoRV) and
two bands when DNA was cleaved with EcoRI, HindIII or XhoI
(Figure 5a). This indicates that there are two copies of this region
in the genome because the fragment used as a probe was amplified
directly from genomic DNA and does not contain sites for the
enzymes EcoRI, HindIII or XhoI. The presence of a single large
band when DNA was cut with BglII and EcoRV suggests that
these copies may be linked. We have identified a single YAC (39)
that encodes both copies of this fragment on an 80kb fragment 
of Arabidopsis DNA which supports the idea of linkage (data 
not shown). 
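
The reasoning from band number to copy number can be illustrated with an in silico digest: a probe lacking internal sites for an enzyme lights up one fragment per copy, unless both copies lie on the same fragment. The sketch below uses a toy linear sequence with two copies of a hypothetical probe separated by an EcoRI site; it ignores partial digestion, fragment size resolution and hybridization stringency.

```python
def digest(genome, site, cut_offset=1):
    """Cut a linear sequence at every occurrence of site; return the fragments.

    cut_offset is the cut position within the site (1 for G^AATTC and A^GATCT).
    """
    cuts = []
    pos = genome.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = genome.find(site, pos + 1)
    edges = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(edges, edges[1:])]


def count_bands(genome, site, probe):
    """Number of fragments that contain the intact probe sequence."""
    return sum(probe in fragment for fragment in digest(genome, site))


# Toy data (assumed): two linked probe copies with one EcoRI site between them.
probe = "ACGTACGTAC"
genome = "AAAA" + probe + "TTGAATTCTT" + probe + "AAAA"

ecori_bands = count_bands(genome, "GAATTC", probe)  # EcoRI
bglii_bands = count_bands(genome, "AGATCT", probe)  # BglII, no site present
```

The enzyme that cuts between the copies gives two bands and the enzyme that does not gives one, the pattern interpreted above as two linked copies.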

The same blot was reprobed with a 398 bp fragment from 
cDNA clone Pc2 (Figure 5b). This probe encodes a region 86% 
homologous (over 71bp) to the PCR amplified region flanked 
by 84 bp (5') and 78 bp (3') of coding sequence and 165 bp of
3' untranslated sequence including a poly A tail (18 b) (probe 1,
Figure 2). At high stringency one strongly hybridizing band is
seen in each lane plus a second band of much lower intensity;
with the exception of DNA cut with BglII where there is one
band in common, neither of these bands comigrates with the bands 
identified by the PCR probe. After extended exposure of the 
autoradiogram, or when the hybridization stringency was 
reduced, the bands identified by the PCR probe can be detected 
by hybridization to the cDNA probe (data not shown). This 
suggests that there is a small family of genes with homology to 
a DNA methyltransferase. 

DISCUSSION 

We suggest that the inferred amino acid sequence of the 
Arabidopsis protein described in this report is a cytosine 
methyltransferase based on its homology to both mammalian and 
prokaryotic cytosine-5 methyltransferases in the C terminal or 
methyltransferase domain (9, 19, 29). Like the mammalian 
enzymes, the Arabidopsis enzyme has eight of the ten regions 
characteristic of the prokaryote cytosine methyltransferases (9). 
Motifs I and IV, which have been identified as the S-adenosyl 
methionine binding domain and the active site respectively, are 
highly conserved between this Arabidopsis enzyme, both 
mammalian methyltransferases and all prokaryote cytosine 
methyltransferases. The presence of the 8 motifs, found only in 
cytosine-5 methyltransferases, is strong evidence that the 
Arabidopsis protein described here also functions as a DNA 
methyltransferase. 

The variable target recognition domain between motifs VIII 
and IX is less well conserved, between the plant and mammalian 
enzymes, than the remainder of the methylase domain. Homology 
between the mouse and Arabidopsis proteins in this region 
suggests that they may share a common target sequence, that is 
CG dinucleotides, but the observed differences make this less
than certain. 



In both mammalian enzymes, the N terminal domain is 
separated from the methyltransferase domain by a run of 
alternating lysine-glycine residues. There is a shorter glycine- 
lysine rich sequence separating the two domains in the 
Arabidopsis protein. Although the homology between the mouse 
and human proteins is somewhat lower in the N-terminal domain
than in the methyltransferase domain (70% compared to 83%), 
the two proteins are still highly conserved. In contrast the 
mammalian and plant enzymes show only 24% homology in the 
N-terminal domain; homologous regions are short and scattered 
throughout this domain. Perhaps significantly, the region 
(residues 207-455) that targets the mouse enzyme to the 
replication fork in S phase nuclei (36) shows homology to residues 
120-280 in the Arabidopsis protein, suggesting that the latter 
may also be located at the replication fork. While no function 
has been assigned to other regions of homology, conservation 
of these sequences between plants and mammals suggests that 
they may be essential for enzyme function. The most significant 
feature of the mammalian enzymes in this domain, a zinc binding 
region, is absent from the Arabidopsis protein. The motif 
S/TPXX, where X tends to be a basic amino acid, occurs 
frequently in regulatory proteins; these motifs bind in the minor 
groove of DNA with the narrower minor groove of AT rich DNA 
being the preferred binding site (40). The presence of this motif, 
which occurs ten times in the N terminal domain of the mouse 
methyltransferase (41) and five times in the plant enzyme, may 
indicate this domain is involved in binding DNA. 
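
Occurrences of the S/TPXX motif can be counted with a short pattern search. The sketch below accepts any residue at the two X positions, which is a simplifying assumption because the text notes that X tends to be basic; a lookahead is used so that overlapping occurrences are counted. The test sequences are hypothetical.

```python
import re

# Serine or threonine, then proline, then any two residues.
STPXX = re.compile(r"(?=[ST]P..)")


def count_stpxx(protein):
    """Count S/TPXX occurrences, including overlapping ones."""
    return len(STPXX.findall(protein.upper()))


two_sites = count_stpxx("MSPKKAATPRRG")   # SPKK and TPRR
overlapping = count_stpxx("SPSPKK")       # SPSP and SPKK overlap
truncated = count_stpxx("AAASP")          # too short for two X residues
```

Applied to a full domain sequence, a count of this kind gives the figures compared above for the mouse and plant enzymes.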

The two protein domains of the mouse enzyme fold 
independently and can be separated by proteolytic cleavage. When 
the N terminal domain was cleaved from the methyltransferase 
domain the latter retained activity; separation of the two domains 
caused a large stimulation in the rate of de novo methylation, 
that is methylation of unmethylated DNA (35). The rate of 
methylation of a hemimethylated substrate was not significantly 
changed by separation of the two domains, suggesting that the 
amino terminal domain down regulates de novo methylation by 
the intact enzyme (35). It should now be possible to determine 
the function of the corresponding domain in the plant enzyme. 
Bacterial methylases, which have no counterpart to the amino 
terminal domain, show no discrimination between unmethylated 
and hemimethylated DNA. The mammalian methyltransferases 
may have arisen by fusion of two ancestral genes, one with 
methyltransferase activity and the other a sequence specific DNA 
binding protein (20, 35). The finding that the Arabidopsis 
methyltransferase lacks the zinc binding domain and shows only 
limited homology to the mouse protein in the amino terminal 
domain suggests that this domain has evolved more rapidly than 
the methyltransferase domain. Alternatively, gene fusion giving 
rise to the complex methyltransferase in eukaryotes may have 
occurred independently in the plant and animal kingdoms. 

In contrast to the mouse where a single methyltransferase gene 
has been detected (19), a small multigene family with homology 
to the region amplified in PCR (regions IX-X) has been
identified in Arabidopsis DNA. At least two members of this 
family (described above), genes represented either by cDNA 
clones or by the PCR amplified product and its corresponding 
genomic clone, have greatest homology in a data base search 
to other cytosine-5 methyltransferases. This gene family may 
encode enzymes that differ in specificity of methylation, for 
example methylating cytosines in CG or CNG motifs, in the time 
during development at which they are expressed, or which are 
targeted to the chloroplast rather than the nucleus. Studies with 
transgenic plants will clarify the function and regulation of these
genes. 